Learning Bilingual Lexicons from Monolingual Corpora

Authors

  • Aria Haghighi
  • Percy Liang
  • Taylor Berg-Kirkpatrick
  • Dan Klein
Abstract

We present a method for learning bilingual translation lexicons from monolingual corpora. Word types in each language are characterized by purely monolingual features, such as context counts and orthographic substrings. Translations are induced using a generative model based on canonical correlation analysis, which explains the monolingual lexicons in terms of latent matchings. We show that high-precision lexicons can be learned in a variety of language pairs and from a range of corpus types.
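As an illustration of the "purely monolingual features" mentioned in the abstract, here is a minimal sketch of how context counts and orthographic substrings might be collected for a single word type. The function names, the substring length limit `max_n`, the context `window` size, and the toy corpus are all assumptions made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
from collections import Counter


def orthographic_features(word, max_n=3):
    """Count every character substring of length 1..max_n in `word`."""
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(word) - n + 1):
            feats[word[i:i + n]] += 1
    return feats


def context_features(tokens, target, window=2):
    """Count tokens within `window` positions of each occurrence of `target`."""
    feats = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target occurrence itself
                feats[tokens[j]] += 1
    return feats


# Toy monolingual corpus (hypothetical, for illustration only).
corpus = "the state council met and the state assembly voted".split()
print(orthographic_features("state", max_n=2))
print(context_features(corpus, "state"))
```

Both feature types are computed from one language's text alone, so the same functions could be run independently on each side of a language pair before any cross-lingual matching step.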

Similar resources

Learning Indonesian-Chinese Lexicon with Bilingual Word Embedding Models and Monolingual Signals

We present research on learning an Indonesian-Chinese bilingual lexicon using monolingual word embeddings and bilingual seed lexicons to build a shared bilingual word embedding space. We make a first attempt to examine the impact of different monolingual signals for the choice of seed lexicons on model performance. We found that although monolingual signals alone do not seem to outperform sig...

XRCE Participation in CLEF 2002

In this paper, we describe the methods we used for the Cross-Lingual Evaluation Forum CLEF 2002, and more specifically for the GIRT Task. The methods are based on (1) the extraction of two bilingual lexicons, one from parallel corpora and the other one from comparable corpora, (2) the optimal combination of these bilingual lexicons in Cross-Language Information Retrieval and (3) the combination...

On the Role of Seed Lexicons in Learning Bilingual Word Embeddings

A shared bilingual word embedding space (SBWES) is an indispensable resource in a variety of cross-language NLP and IR tasks. A common approach to SBWES induction is to learn a mapping function between monolingual semantic spaces, where the mapping critically relies on a seed word lexicon used in the learning process. In this work, we analyze the importance and properties of seed lexicons f...

Metalinguistic Awareness and Bilingual vs. Monolingual EFL Learners: Evidence from a Diagonal Bilingual Context

This paper reports a study of 85 Iranian EFL learners in the English Language Department of Urmia University. It explores possible differences between the performance of 38 Persian monolingual and 47 Turkish-Persian bilingual EFL learners on metalinguistic tasks involving ungrammatical structures and translation. The underlying hypothesis is that bilinguals in diagonal bilingual contexts experience a ...

Bilingual Lexicon Construction Using Large Corpora

This paper introduces a method for learning bilingual term and sentence level alignments for the purpose of building bilingual lexicons. Combining statistical techniques with linguistic knowledge, a general algorithm is developed for learning term and sentence alignments from large bilingual corpora with high accuracy. This is achieved through the use of filtered linguistic feedback between term ...

Publication date: 2008